This notebook explains the code from the Fizz Buzz in TensorFlow blog post written by Joel Grus.
You should read his post first; it is super funny!
His code tries to play the Fizz Buzz game using machine learning.
This notebook is for real beginners who want to understand the basics of TensorFlow by reading code.
Feedback welcome @dh7net
The code contains several parts: encoding the data, building the model, training it, and using the model to make predictions.
In [1]:
import numpy as np
import tensorflow as tf
In [2]:
NUM_DIGITS = 10
def binary_encode(i, num_digits):
    # shift i right by d bits and keep the lowest bit: the d-th binary digit of i
    return np.array([i >> d & 1 for d in range(num_digits)])

# Let's check if it works
for i in range(10):
    print i, binary_encode(i, NUM_DIGITS)
In [3]:
def fizz_buzz_encode(i):
    if   i % 15 == 0: return np.array([0, 0, 0, 1])  # "fizzbuzz"
    elif i % 5  == 0: return np.array([0, 0, 1, 0])  # "buzz"
    elif i % 3  == 0: return np.array([0, 1, 0, 0])  # "fizz"
    else:             return np.array([1, 0, 0, 0])  # the number itself

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

# Let's see how the encoding works
for i in range(1, 16):
    print i, fizz_buzz_encode(i)
In [4]:
# and the decoding
for i in range(1, 16):
    fizz_or_buzz_number = np.argmax(fizz_buzz_encode(i))
    print i, fizz_or_buzz_number, fizz_buzz(i, fizz_or_buzz_number)
In [5]:
training_size = 2 ** NUM_DIGITS
print "Size of the set:", training_size

# The training set starts at 101: the numbers 1 to 100 are kept aside for testing.
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, training_size)])
trY = np.array([fizz_buzz_encode(i) for i in range(101, training_size)])

print "First 15 values:"
for i in range(15):  # trX[0] is the encoding of 101, trX[1] of 102, and so on
    print i + 101, trX[i], trY[i]
The model is made of an input layer (the binary encoding of the number), one hidden layer, and an output layer of 4 values.
The input is fully connected to the hidden layer, and a ReLU function is applied.
The ReLU function is a rectifier that simply outputs zero when its input is negative, and the input itself otherwise.
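To make this concrete, here is a minimal numpy sketch of what ReLU computes (illustration only; the model below uses tf.nn.relu):

def relu_example(x):
    return np.maximum(x, 0)  # negative values become 0, positive values pass through

print relu_example(np.array([-2.0, -0.5, 0.0, 1.5]))  # -> [ 0.   0.   0.   1.5]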
First we'll define a helper function to initialize the parameters with random values.
In [6]:
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))
X is the input
Y is the output
w_h is the weight matrix between the input and the hidden layer
w_o is the weight matrix between the hidden layer and the output
In [7]:
NUM_HIDDEN = 100  # number of neurons in the hidden layer
X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])
w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])
To create the model we apply the w_h parameters to the input,
and then we apply the ReLU function to calculate the value of the hidden layer.
The w_o coefficients are used to calculate the output layer. No rectification is applied there.
py_x is the predicted value for a given input, represented as a vector (dimension 4).
In [8]:
def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))  # hidden layer, with ReLU activation
    return tf.matmul(h, w_o)           # output layer: raw scores, no activation

py_x = model(X, w_h, w_o)
In [9]:
# The cost measures how far the predicted scores (py_x) are from the expected labels (Y)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, Y))
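For intuition, here is a minimal numpy sketch (illustration only, with made-up values) of what softmax cross-entropy computes for a single input: softmax turns the raw scores into probabilities, and the cross-entropy penalizes a low probability on the correct class.

logits = np.array([2.0, 1.0, 0.1, -1.0])         # hypothetical raw outputs for one input
label = np.array([1, 0, 0, 0])                   # one-hot expected answer
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: probabilities that sum to 1
loss = -np.sum(label * np.log(probs))            # cross-entropy: -log(prob of the true class)
print "probabilities:", probs
print "loss:", loss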
Training a model in TensorFlow is extremely simple: you just define a training operator!
In [10]:
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
This operator will minimize the cost using gradient descent, the most common optimization method for finding parameter values that reduce the cost.
We'll also define a prediction operator that outputs the most likely class for a given input.
In [11]:
predict_op = tf.argmax(py_x, 1)  # index of the largest output: 0 = the number, 1 = fizz, 2 = buzz, 3 = fizzbuzz
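To see what this does, here is a minimal numpy sketch with made-up scores (illustration only): argmax picks the index of the largest score, which fizz_buzz then decodes into text.

example_scores = np.array([0.1, 2.3, 0.2, -1.0])  # hypothetical network output for one number
print np.argmax(example_scores)                    # -> 1, which fizz_buzz decodes as "fizz"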
One epoch consists of one full training cycle on the training set. Once every sample in the set has been seen, you start again, marking the beginning of the 2nd epoch.
The training set is randomly shuffled before each epoch.
Learning is not done on the full set at once.
Instead, the training set is divided into small batches and a learning step is done for each of them.
In [12]:
BATCH_SIZE = 128
Here is an example of the indexes used for one epoch:
In [13]:
# A random permutation of the indexes will be used during training, for each epoch
permutation_index = np.random.permutation(range(len(trX)))
for start in range(0, len(trX), BATCH_SIZE):
    end = start + BATCH_SIZE
    print "Batch starting at", start
    print permutation_index[start:end]
In [14]:
# Launch the graph in a session
sess = tf.Session()
tf.initialize_all_variables().run(session=sess)

for epoch in range(5000):
    # Shuffle the data before each training iteration.
    p = np.random.permutation(range(len(trX)))
    trX, trY = trX[p], trY[p]

    # Train in batches of 128 inputs.
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

    # And print the current accuracy on the training data.
    if epoch % 100 == 0:  # every 100 epochs, to not overflow the Jupyter log
        # np.mean(A == B) returns a number between 0 and 1 (true_count / total_count)
        print(epoch, np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY})))
In [15]:
# And now for some fizz buzz
numbers = np.arange(1, 101)
# binary_encode applied to a whole vector returns a (NUM_DIGITS, 100) array; transpose it to (100, NUM_DIGITS)
teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
teY = sess.run(predict_op, feed_dict={X: teX})
output = np.vectorize(fizz_buzz)(numbers, teY)
print output
In [16]:
sess.close()  # don't forget to close the session when you no longer need it, or use the *with* statement
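For reference, here is a minimal sketch of the *with* pattern mentioned above; the session is closed automatically when the block exits (sess2 is a hypothetical name for this sketch):

with tf.Session() as sess2:
    tf.initialize_all_variables().run(session=sess2)
    # ... run training and prediction here; no explicit close() needed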
In [17]:
# Let's check the quality
Y = np.array([fizz_buzz_encode(i) for i in range(1, 101)])
print "accuracy", np.mean(np.argmax(Y, axis=1) == teY)

for i in range(1, 101):  # check every number from 1 to 100
    actual = fizz_buzz(i, np.argmax(fizz_buzz_encode(i)))
    predicted = output[i - 1]
    ok = (actual == predicted)
    print i, "{:>8}".format(actual), "{:>8}".format(predicted), ok